Model-driven indexing: Indexing by ignoring content
Author
Abstract
Indexing can be viewed as the process of segmenting a large information space into subspaces, with the indices acting as descriptors for those subspaces. In information retrieval systems, the indices are used to retrieve documents in response to a user query, whereas in knowledge navigation systems the indices circumscribe browsable spaces of documents. In this paper, I take the position that traditional approaches to indexing are "data-driven" and that an alternative, "model-driven" approach to indexing holds considerable promise, particularly within an enterprise setting. Traditional approaches to information retrieval assume that the most valuable indices for a document can be derived from the content of the document. While statistical techniques attempt to capture the content through statistically significant keywords, AI techniques go one step further and derive content through parsing and inferencing. In either case, the indices are completely determined by the documents in a document collection. I refer to this approach as "data-driven."
When a user attempts to locate information in a document collection, the collection itself is only one part of the equation; the user forms the other part. For the user, the effectiveness of an indexing scheme depends on how well it supports his/her goals, task requirements and level of expertise, not on abstract measures of effectiveness such as precision and recall. An alternative approach to indexing, therefore, is to start from the user's end, deriving indices from users' mental models and task models. I refer to this approach as "model-driven." In the most general case, model-driven approaches are difficult to instantiate for large document collections because different users have different mental and task models. However, the problem takes on a different hue when we consider information retrieval and navigation within an enterprise setting. Unlike the general populace, mental models and task models exist within...
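To make the contrast between the two approaches concrete, here is a toy Python sketch; it is an illustration only, not the author's method. The data-driven side uses TF-IDF keyword weighting as a stand-in for the "statistical techniques" mentioned above, so the index terms come entirely from each document's own text. The model-driven side assigns documents to the goals of a task model using only enterprise metadata (an `origin` field naming the process step that produced the document) and never reads the text. The documents, `origin` values, `TASK_MODEL` goals and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection: document text plus enterprise metadata recording
# which (invented) business process step produced each document.
DOCS = {
    "d1": {"text": "vendor invoice approval", "origin": "ap/approve-invoice"},
    "d2": {"text": "vendor onboarding checklist", "origin": "procurement/onboard-vendor"},
    "d3": {"text": "invoice payment schedule", "origin": "ap/run-payments"},
}

# Hypothetical task model: user goals mapped to the process steps that serve them.
TASK_MODEL = {
    "pay a supplier": {"ap/approve-invoice", "ap/run-payments"},
    "add a supplier": {"procurement/onboard-vendor"},
}


def data_driven_index(docs, top_k=2):
    """Data-driven: index terms are the top-k TF-IDF words of each document's own text."""
    tokenized = {doc_id: d["text"].lower().split() for doc_id, d in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in tokenized.values() for term in set(terms))
    index = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        scores = {t: (c / len(terms)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically so results are deterministic.
        for term in sorted(scores, key=lambda t: (-scores[t], t))[:top_k]:
            index.setdefault(term, set()).add(doc_id)
    return index


def model_driven_index(docs, task_model):
    """Model-driven: index entries are the task model's goals; document text is never read."""
    return {
        goal: {doc_id for doc_id, d in docs.items() if d["origin"] in steps}
        for goal, steps in task_model.items()
    }


if __name__ == "__main__":
    print(sorted(data_driven_index(DOCS)))
    print(model_driven_index(DOCS, TASK_MODEL))
```

With this toy data, a user who thinks in terms of "supplier" finds no such entry in the data-driven index, because the documents themselves say "vendor", whereas the model-driven index files the invoice and payment documents under the user-facing goal "pay a supplier".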
Similar resources
Reflections on image indexing: a picture is worth a thousand words
Purpose: This paper presents various image indexing techniques and discusses their advantages and limitations. Methodology: Through a review of the literature, it identifies three main image indexing techniques, namely concept-based image indexing, content-based image indexing and folksonomy. It then describes each technique. Findings: Concept-based image indexing is te...
A two-stage "split-select" model for automatic indexing of Persian texts
Purpose: Each language poses its own problems, which calls for appropriate models for the automatic indexing of each language. These models should address the exhaustivity and specificity of indexing. This paper aims to introduce and evaluate a model suited to Persian automatic indexing. The model proposes to break the text into particles of candidate terms and to c...
Improved Chinese spoken document retrieval with hybrid modeling and data-driven indexing features
Different models retrieve documents based on different approaches to extracting the underlying content. Different levels of indexing features also offer different functionalities and discriminative power when retrieving documents. In this paper, we present results for Chinese spoken document retrieval with hybrid models to integrate the knowledge obtainable from three basic retrieval mode...
A divisive hierarchical clustering-based method for indexing image information
It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Considerable effort has gone into developing such multi-dimensional indexing structures. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...
High-Performance Indexing of High-Volume Database Content
This research project aimed to produce a high-speed, web-based indexing solution for high volumes of metadata database content. Such a solution would enable users to index their metadata content within reasonable periods of time and to debug and profile the entire indexing process. The resulting model involves retrieving high volumes of database metadata content, ...